CCG augmented hierarchical phrase-based machine translation
Authors
Abstract
We present a method for incorporating target-language syntax, in the form of Combinatory Categorial Grammar (CCG), into a Hierarchical Phrase-Based MT system. We adopt the approach of Syntax Augmented Machine Translation (SAMT) and attach syntactic categories to nonterminals in hierarchical rules, but instead of using a constituent grammar, we take advantage of the rich syntactic information and flexible structures of CCG. We present results on Chinese-English DIALOG IWSLT data and compare them with the Moses SAMT4 and Moses Phrase-Based systems. Our results show relative BLEU score increases of 5.47% and 1.18% over the Moses SAMT4 and Phrase-Based systems, respectively. We analyze the reasons behind this improvement and find that our approach has better coverage than the SAMT approach. Furthermore, CCG-based syntactic categories attached to nonterminals in hierarchical rules prove to be less sparse and to generalize better than syntactic categories extracted according to the SAMT method.
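To make the idea of attaching syntactic categories to nonterminals concrete, here is a minimal Python sketch. It is not the authors' implementation: the `Rule` type, the `relabel` and `show` helpers, the example sentence pair, and the category assignment are all assumptions invented for illustration. It shows a generic hierarchical rule whose `X` nonterminals are replaced by CCG categories, including `S/NP`, a category that CCG can assign to a span such as "John reads" even though a constituent grammar would not treat that span as a constituent.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Union

# A symbol is either a terminal word (str) or a nonterminal (label, index).
Symbol = Union[str, Tuple[str, int]]


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for a synchronous hierarchical rule."""
    lhs: str
    source: Tuple[Symbol, ...]
    target: Tuple[Symbol, ...]


def relabel(rule: Rule, lhs_label: str, gap_labels: Dict[int, str]) -> Rule:
    """Return a copy of `rule` with generic X labels replaced by categories.

    `gap_labels` maps a nonterminal index to its category; indices that are
    missing from the map keep the generic label "X".
    """
    def swap(sym: Symbol) -> Symbol:
        if isinstance(sym, tuple):
            _, index = sym
            return (gap_labels.get(index, "X"), index)
        return sym

    return Rule(
        lhs=lhs_label,
        source=tuple(swap(s) for s in rule.source),
        target=tuple(swap(s) for s in rule.target),
    )


def show(symbols: Tuple[Symbol, ...]) -> str:
    """Render one side of a rule, writing nonterminals as LABEL_index."""
    return " ".join(
        f"{s[0]}_{s[1]}" if isinstance(s, tuple) else s for s in symbols
    )


# Invented example: the gap X_1 stands for a span such as "John reads".
generic = Rule(
    lhs="X",
    source=(("X", 1), "这", "本", "书"),
    target=(("X", 1), "this", "book"),
)

# Assumed CCG labelling: the gap is S/NP (a sentence missing an NP to its
# right) and the whole rule yields S.
labeled = relabel(generic, "S", {1: "S/NP"})

print(labeled.lhs, "->", show(labeled.target))
```

In a real system the categories would come from a CCG parser or supertagger run over the target side of the training data; the sketch only shows where such labels would sit in a rule.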
Similar resources
CCG-Augmented Hierarchical Phrase-Based Statistical Machine Translation
Extending CCG-based Syntactic Constraints in Hierarchical Phrase-Based SMT
In this paper, we describe two approaches to extending syntactic constraints in the Hierarchical Phrase-Based (HPB) Statistical Machine Translation (SMT) model using Combinatory Categorial Grammar (CCG). These extensions target the limitations of previous syntax-augmented HPB SMT systems which limit the coverage of the syntactic constraints applied. We present experiments on Arabic–English and ...
The DCU machine translation systems for IWSLT 2011
In this paper, we describe Dublin City University's (DCU) submissions to the IWSLT 2011 evaluation campaign. We participated in the Arabic-English and Chinese-English Machine Translation (MT) track translation tasks. We use phrase-based statistical machine translation (PBSMT) models to create the baseline system. Due to the open-domain nature of the data to be translated, we...
Supertags as Source Language Context in Hierarchical Phrase-Based SMT
Statistical machine translation (SMT) models have recently begun to include source context modeling, under the assumption that the proper lexical choice of the translation for an ambiguous word can be determined from the context in which it appears. Various types of lexical and syntactic features have been explored as effective source context to improve phrase selection in SMT. In the present w...
CCG Contextual Labels in Hierarchical Phrase-Based SMT
In this paper, we present a method to employ target-side syntactic contextual information in a Hierarchical Phrase-Based system. Our method uses Combinatory Categorial Grammar (CCG) to annotate training data with labels that represent the left and right syntactic context of target-side phrases. These labels are then used to assign labels to nonterminals in hierarchical rules. CCG-based contextu...